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Abstract 



The purpose of this paper is to propose a time-varying vector autoregressive model 
(TV-VAR) for forecasting multivariate time series. The model is casted into a state-space 
form that allows flexible description and analysis. The volatility covariance matrix of the 
time scries is modelled via inverted Wishart and singular multivariate beta distributions 
allowing a fully conjugate Bayesian inference. Model performance and model compar- 
^ ' ison is done via the likelihood function, sequential Bayes factors, the mean of squared 

^^—{ , standardized forecast errors, the mean of absolute forecast errors (known also as mean 

qh' absolute deviation), and the mean forecast error. Bayes factors are also used in order 

to choose the autoregressive order of the model. Multi-step forecasting is discussed in 
detail and a flexible formula is proposed to approximate the forecast function. Two ex- 
amples, consisting of bivariate data of IBM shares and of foreign exchange (FX) rates 
• for 8 currencies, illustrate the methods. For the IBM data we discuss model performance 

, and multi-step forecasting in some detail. For the FX data we discuss sequential portfolio 

' allocation; for both data sets our empirical findings suggest that the TV-VAR models 

, outperform the widely used VAR models. 

04 ' Some key words: Bayesian forecasting, multivariate time series, stochastic volatility, 

I state space, foreign exchange rates, portfolio allocation. 
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1 Introduction 



Over the past 30 years there has been an emerging hterature on multivariate time series 
5^ ; (Hamilton, 1995; Tsay, 2002; Lutkepohl, 2005). Multivariate time series forecasting is re- 

quired in many financial applications, for example to enable optimal portfolio allocation or 
to construct trading strategies over sectors of the market, or exchange rates. In this direction 
vector autoregressive models (Sims, 1980; Tiao and Tsay, 1989; Ni and Sun, 2003) are well 
established as flexible and useful multivariate models. A comprehensive treatment of these 
models can be found in Liitkepohl (2005). 

A recent advance in time series analysis is the development of autoregressive (AR) models, 
and more generally autoregressive moving average models, with time-varying coefficients. 
Such models are developed as in Kitagawa and Gersch (1985, 1996), Dahlhaus (1997), West 
et al. (1999), Prado and Huerta (2002), Andrieu et al. (2003), Lundbergh et al. (2003), 
Prancq and Gautier (2004), Moulines et al. (2005), Huerta and Prado (2006), Abramovich 
et al. (2007), Triantafyllopoulos and Nason (2007), and Zhu and Wu (2007). All these 
studies refer to univariate time series; attempts to model vector time series with time- varying 
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autoregressive (TV-VAR) models include Jiang and Kitagawa (1993), Sarantis (2006), Sato et 
al. (2007), and Triantafyllopoulos (2007). Although such models are capable of capturing the 
time- varying behaviour of time series data, it is desirable that multivariate stochastic volatility 
is included in the model, in particular for situations of financial time series forecasting. For 
example Uhlig (1997) uses a vector autoregressive (VAR) model that incorporates a stochastic 
volatility component, but its application is relatively limited due to the fact that one needs 
to resort to simulation-based estimation techniques (in particular Monte Carlo simulation). 
In a similar direction Aguilar and West (2000) describe the use of particle filters for portfolio 
allocation, but for forecasting and in particular for sequential multi-step forecasting, it is 
desirable to resort to analytic methods that are fast and computationally more stable than 
their simulation-based counterparts. 

Suppose that the px \ vector time series {yt}, which is observed at roughly equal intervals 
of time t = 1, . . . , A^, is generated by the autoregressive model 



yt = 4>ot + ^ityt~i^ ^^dtyt~d + ^t-, et Np{0,T,t), t > d, 



(1) 



where (pot is a p x 1 vector, ^jt is a p x p matrix {j = 1, . . . ,d), d is a positive integer 
(known as the lag order of the autoregression) , is a p x p covariance matrix, the sequence 
of innovations {et} is independent and et follows a p-variate Gaussian distribution. 

In VAR estimation, a common practice is to recast the model in VAR form of order 1, 
known as the reduced form, see e.g. Tiao and Tsay (1989). It is possible to extend this to 
our model ([T]), and write 
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which can be expressed as Yt = V'ot + "^tYt-i + Vt, where = [y^ y^^^ ■ ■ ■ yj_<i+i], V^ot = 
0' • • • 0'], and rj[ = [ej 0' • • • 0']. Now the model for {Yt} is a TV-VAR model of order 
1. Although it can be claimed that Yt possesses a more compact model as that of yt, in 
principle it is harder to derive multi-step forecasting equations for Yt, as the forecast horizon 
will be included in the dimension of Yt and this can cause technical problems. Furthermore, 
we notice that rjt has a singular covariance matrix (only p elements of rjt are non-degenerate) 
and so if one wishes to apply a prior distribution on the covariance matrix of rjt, this has 
to be a singular multivariate distribution, e.g. a singular Wishart distribution. As a result 
the above reduced formulation causes some technical difficulties. These difficulties may be 
reasonably overcome when p, the dimension of yt, and d, the AR order, are not too large, but 
if at least one of p or d are large, then the advantage of employing the reduced form model, 
seems to be lost. 

The aim of this paper is to suggest alternative state-space forms which will enable the 
modeller to overcome many of the difficulties mentioned above. In particular, this paper 
proposes Bayesian estimation for ([T]) after it is put in an appropriate state-space form. The 
proposed model includes a stochastic evolution that can update the volatility covariance ma- 
trix from time t — 1 to t. This evolutionary law is supported by inverted Wishart and singular 
multivariate beta distributions, following developments of Uhlig (1994) and Triantafyllopou- 
los (2008). We discuss multi-step forecasting for this model and we propose a simple formula 
that approximates the true forecast function. The choice of the autoregressive order is an 
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important problem (see e.g. de Waele and Broersen, 2003; or Abramovich et al, 2007), which 
here is resolved by using sequential Bayes factors as a means of model selection. For model 
judgment and model comparison we discuss several measures of goodness of fit (mean of 
squared standardized forecast errors, mean absolute forecast error, and mean forecast error) 
as well as a likelihood-based criterion and sequential Bayes factors. Two examples, consisting 
of bivariate data of IBM shares and foreign exchange (FX) rates of 8 currencies, illustrate 
the methods. For the IBM data set we discuss in detail model performance and multi-step 
forecasting; for the FX data we discuss a sequential portfolio selection strategy. For both 
data sets the time-varying model is found to outperform its time-invariant counterparts and 
as a result we propose that TV-VAR models are superior to VAR models. 

The rest of the paper is organized as follows. In the next section we define the model, 
which estimation and forecasting is discussed in Section [3l Sections H] and [5] include the data 
analyses and the paper closes with brief concluding comments. 

2 Model setting 

Consider model ([T|), with 0ot and generated by a random walk. This model postulates 
that (j)Qt — </'o.t-i ~ and — ^j.t~i ~ 0. Other evolutionary models may be considered, 
e.g. a Markovian evolution, but here we focus our attention on the random walk model. 
Write 

= [ (t>ot ^it ■■■ ^dt ] , 

where denotes the transpose matrix of $f so that is a {dp -|- 1) x p matrix and ^[ is a 
p X (dp + 1) matrix. 

Then we can express the above random walk as 

$t = + nt, (2) 

where we assume that the {dp + 1) x p random matrix fit follows a matrix-variate Gaussian 
distribution, i.e. Qt ~ -^(dp+i)xp(0, Wt, St). This implies that vec(f2t) follows the p{dp + 1)- 
variate Gaussian distribution vec{0,t) ~ -^p(dp+i)(0, St® Wt), where vec(.) denotes the column 
stacking operator of a matrix and ® denotes the Kronecker product. Here the {dp+1) x {dp+1) 
covariance matrix Wt is assumed known, but later we discuss its specification using several 
discount factors. It is also assumed that fit is independent of fi^ {t ^ s) and that for any t, g, 
r^t is independent of e^. 

Model ([l]) can be written in state-space form as 



+ 4 = F/cI.t + eJ, (3) 



where $t is generated by evolution ([2]). We use the notation for the information set up to 
time t = 1, . . . , A^, i.e. yt = {yi, yt). 

It remains to define an evolution for St. Here we use the model of Triantafyllopoulos 
(2008), which adopts the multiplicative evolutionary law of Uhlig (1994), based on the con- 
volution of the Wishart and the singular multivariate beta distribution. For the purpose of 



[1 y't~i ■ ■ ■ y't-d] 



It 



dt 



3 



this paper, we assume that T,t is a symmetric positive-definite matrix and we propose the 
fohowing law 

s-^ = kU{j:^\yBM^;-i), (4) 

where denotes the upper triangular matrix of the Choleski decomposition of 

Bt follows independently of a singular multivariate beta distribution with degrees of 
freedom ki = (n + p — l)/2 and k2 = 1/2, written Bt ~ B{ki,k2)- The singularity of this 
beta distribution is reflected on the fact that 2k2 = 1 < p — 1, and so the matrix Ip — Bf 
is singular. The singular multivariate beta density, which is defined on the Stiefel manifold, 
replaces the determinant of Ip — B^ (which is zero) by the product of the positive eigenvalues of 
that matrix; for more details on the beta distribution the reader is referred to Uhlig (1994). 
Here we set n = 1/(1 — /3), where /3 is a discount factor (0 < /? < 1) and k is defined as 
k = {[3{l—p)+p)/{(3{2—p)+p — l). The choice of the beta distribution and the choice of k are 
done so that resembles a random walk type evolution, i.e. E{T,'j~^\y^~^) = E{T,^\\y^~^) 
and Var(vecp(S^^)|y*~-'^) > Var(vecp(S^\)|y*~-^), where Var(.) denotes covariance matrix 
and vecp(.) denotes the column stacking operator of a covariance matrix. This simply says 
that going from time t — 1 to t, the expectation of the volatility remains unchanged, and the 
respective covariance matrix of the vectorized form of the volatility is increased. 

For t > d + 1, . . . , N, the model consists of equations ([2]), ^ and together with the 
priors 

'^d- N(^dp+i)xpN{md,Pd,^d) and J^-^ ^ Wp{n + 2p, Sd), (5) 

the latter of which, denotes a Wishart distribution with n + 2p degrees of freedom and 
parameter matrix Sd- The quantities md,Pd,Sd are assumed known. A weakly informative 
prior specification suggests = initial belief of ^d, Pd = lOOO/^p+i and Sd = Ip- The 
covariance matrix Wt-, which is responsible for the shocks of <&t, can be defined by using a 
discount matrix A, i.e. Var(vec($t)|Et_i, y*"^) = T^t-i ® Ru with Rt = A"^/2p^^^^-i/2 ^ 
Pt-i + Wt and Pt-i being known at time t — 1 so that Var(vec(<I>f_i)|Sf_i, y*~^) = Tit-i (8) 
Pt-i- The discount matrix A is the diagonal matrix of dp + 1 discount factors 6j, i.e. A = 
diag((5i, . . . , 5dp+i)- Similar discount models are discussed in West and Harrison (1997). The 
discount factors 5i, . . . ,5dp+\ and fj can be regarded as hyperparameters and they can be 
specified by considering goodness of fit criteria. 

We note that, although the above model is very general, subclasses of this include interest- 
ing and applicable models. For example, if = ■ ■ ■ = = 0, then we obtain a traditional 

1/2 

stochastic volatility model yt = (pot + E^ et and ej ~ Np{0,lp) (Triantafyllopoulos, 2008). If 
(pot = 4>, ^jt = ^ are all time-invariant, then we obtain a VAR model with stochastic volatility 
(Uhlig, 1997). If Et = E is time-invariant (this can be achieved by setting /3 = 1, but we 
need to replace n by = nt-i + 1 = riQ + t), then we obtain the model of Triantafyllopoulos 
(2007); such a model may be useful for very short periods of time (when the volatility can be 
assumed time-invariant), or for time series that do not exhibit a heteroscedastic behaviour 
(such data may arise in environmental studies). 

3 Estimation and forecasting 
3.1 Bayesian estimation 

Bayesian estimation follows from a generalization of the Kalman filter, the details of which 
can be found in Triantafyllopoulos (2008). Here we briefly describe the algorithm. Sup- 
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pose that at time t — 1, the posterior distributions of ^t-i, given Ht-i, and are given 
by ~ iV(dp+i)xp(mt_i,Pt_i,St_i) and St_i|y*~^ ~ IWp{n + 2p, St-i) (here 

IW{.) denotes the inverted Wishart distribution so that follows a Wishart distribution). 
Moving on to time t with information y*~^, we have <I>t|St_i, ~ -/V(dp+i)xp(^i-i5 -Rt; 
and Stly*"^ rsj IWp{Pn + 2p, fc~^S't_i), where Rt and /c are defined in the previous section. 
Then by observing yt, with information y*, we update the posteriors at t as <I>f|St,y* ~ 
N(dp+i)xp{m, Pt, St) and Etly* ~ IWp{n+2p, St), with mt = mt_i+iCteJ, Pt = Rt-KfK^/Qt, 
Kt = RtFt/Qt, Qt = FiRtFt + l,et = yt- K-i^t, and St = k-^St-i + ete'JQf Kt is called 
the adaptive vector (which plays a similar role as the Kalman gain in the standard Kalman 
filter), and et is the one-step ahead forecast error. At time t = d, starting with the priors ([S]), 
the above posterior distributions give a recursive algorithm for any t = d + 1, . . . , N . 

Given some data yi, . . . , y^r the likelihood function of S^+i, . . . , Sat can be derived by 



N 

-L(Dd+i,... jSat;?/^) = Y\_ Piyt\^t)p{^t\^t-i) 



t=d+l 

where p{yt\T,t) is the density function of yt conditional on S^, and p(Sj|Sj_i) is the density 
function of T,t, given 'St-i- From the Kalman filter we have yt\^t ^ Np{m'tFt,Tjt) and from 
the singular multivariate beta density of model ([1]), we obtain 

r(n/2) 

where n and k are defined in Section [21 and Lt is the diagonal matrix with elements the non- 
zero eigenvalues of the matrix Ip — k~'^{U(Ti~[\)')~'^Ti~[^U{T,~[\). This result is derived by the 
evolution of the volatility dH) and the singular multivariate beta density of Bt, details of which 
can be found in Triantafyllopoulos (2008). Then the log-likelihood function of S^+i, . . . , Sat 
is 

N _ ^ 

£(Srf+i,...,Sjv;y^) = C-- (yt-m;_iFt)'S7i(yt-mUFt) + ^ ^ log|St_i| 

t=d+l t=d+l 
N N 

log St - ^ 2^ log ivt , 



t=d+l t=d+l 



where 



Np, , Npin-p)^ , r„((n+l)/2) 
^log(27r2) ^' \ogk + N\og - " ^ 



2 ' 2 ° ° rp(n/2) 

From Stly* ~ IWp{n+2p, St), by replacing the posterior mean of St |y* in ^(S^+i, . . . , S^r; y^) 
we can evaluate the above log-likelihood function for the given data y^ = {yi, . . . , yn)- 



3.2 Stability of Pt 

In this section we show that if {yt} is a bounded sequence, then the sequence {Pt} is also 
bounded. This is an important property because if {Pt} were unbounded, then the estimates 
of $it and the forecasts of yt would be totally unstable, possibly having infinite variances. In 
particular, given data yi, ■ ■ ■ ,yN, we assume that ||yt|| < I, where || . || denotes the Eucledian 
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norm of a vector or matrix and I is some positive number. This in principle means that by 
writing yt = [yu • • • ypt]' , all yjt are bounded scalar time series. 

Since {yt} is bounded, {Ft} is also bounded and thus {F^F/} is bounded. From the 
recursion Pt = Rt — QtKtKj. we have Rf^ = R^^ + FtF[ (for all t > d) and so we can write 

i=0 i=0 

by adopting the weakly informative prior P^^ ?a 0. Then 

t-d t~d dp+1 

II ||< 2; II A^/^ fll Ft-iFU \\<mY,{5\ + ■■■ + 5^,+,) <mY, < 00, 

i=0 i=0 j=l ^ 

which shows that {Rt^^} is bounded. If R^^ ^ 0, one needs to add a factor in the bound of 
Rt, which again shows that P^"^ is bounded, except if P^ = 0. Thus, for any prior P^^ > 0, 
the sequence {Rf^} is bounded. Since Pf is non-singular, for any t > d, it follows that the 
sequence {Rt} is also bounded, for any prior P^ > 0. 



3.3 Choice of order d 

The performance of the model is usually judged against its forecast performance, which is 
best described by its forecast distribution. Thus sequential Bayes factors, which for univariate 
time series are described in West and Harrison (1997), can be used in order to compare and 
contrast models which differ in the order of the AR. Suppose we have two TV-VAR models, 
which have respectively lag orders di and ^2 and thus they consequently differ in the design 
vectors Fit and F2t, which are functions of di and d2- According to the above, the one-step 
forecast distributions are Student t, i.e. y(+i|y*, model i ^ tp{n,yit{l),Q*t{l)) and thus the 
Bayes factor of model 1 against model 2 is defined by 

^ piyt+i\y\model 1) ^ ( ( Pn + elt+iiQlta))-' ^w ] 

*^ ^ p(yt+i|y*,model2) VlQuWI/ [pn + e'^^,^,{QUl))~'e2,t+i J 

where yit{l) is the one-step forecast mean of yt+i (under model i), and Q*((l) is the scale 
matrix of the t distribution (see also the next section). Comparison of Ht{l) with 1 can 
indicate preference of model 1 (li Ht{l) > 1) or preference of model 2 (ii Ht{l) < 1), while if 
Ht{l) = 1, the two models are equivalent. Thus we choose the order that yields larger values 
in the respective forecast function. The above formula for the Bayes factor can also be used 
more generally for comparison of two models that differ in the values of the hyperparameters, 
e.g. in the discount factors, the advantage of the above scheme is that it can be applied 
sequentially and thus indicate preference of a model at each time point. This can lead to 
sequential model comparison and in the context of this section one can consider schemes 
where the order of the AR is time-varying, although this is not developed further here. 



3.4 Multi-step forecasting 

Forecasting commences by considering the density of ?/i+/i|y*, for some positive integer h, 
known as the forecast horizon. Denote with yt{h) the /i-step forecast mean of yt+h, i-e. 
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yt{h) = E(yt+h\y^) and with Qt{h) the respective /i-step forecast covariance matrix. Note 
that in the state-space formulation ([3]), Ft can be stochastic, as it includes values of yt- We 
denote with Ft+h the estimate of Ft+h by replacing the unknown values of yt^j by the known 
yt{j), for all j < h. Note that if h > d we have 

F,V,= [ 1 y',ih-l) ... y[{h-d) ] 

while if /i < d we get 

F/+,= [l y[{h-l) ... y',{l) y[ ... y[^^_,] 
From the evolution ([2]) of ^t we obtain 

h 

^t+h = ^t + ^^t+i- 
i=l 

From the posterior distribution $f we obtain ^t+h\>^t+h, y* ~ N{dp+i)xpi'mt, Rt{h),T.t+h) 

and so from y^^^ = -F^'^_^<l>t+fe + the /i-step forecast density of ytJ^-h can be approximated 

as yt+h\y^ ~ tpy,i{(3n,m[Ft+h, {Fl_^_f^Rt{h)Ft+h + l)k~^ St), where = -Pi + XliLi W^t+i and 

St-^-h = k~^St. Here tpxi(-) denotes the p-variate Student t distribution, from which it follows 

yt{h) = nif-Ft+h and Qt{h) = '^[3-2 ^' 

where (3 > 2/3. For h = 1, the forecast mean vector yt(l) and the forecast covariance matrix 
(5t(l) are exact, as = [1 y'f • • • which, given is not stochastic. 

We can define the /i-step forecast errors as et{h) = yt+h—yt{h), which approximately follow 
a t distribution, i.e. et{h)\y^ ~ tp^i{Pn,0,QUh)), where Q*t{h) = {F[^Mh)Ft+h + l)k-^St. 
We can then define the standardized /i-step forecast errors as ut (h) = {Q*t{h))~y^et{h), where 
{Qt{^))~^^'^ denotes the symmetric square root matrix of {Qt{h))~^ , so that, approximately, 
ut{h)\y^ ~ tpxi{(3n,0, Ip), from which we can derive 100a% quantiles and so we can build 
credible bounds for et{h). Finally, from E{et{h)\y^) = and Var(et(/i)|y*) = Qt{h), we can 
define standardized versions of et(/i) as vt (h) = {Qt{h))-^'^et{h) so that E{vt{h)\y^) = and 
Yax{vt{h)\y^) = Ip. Then, a measure of goodness of fit is the mean of squared standardized h- 
step forecast errors, defined as MSSE{h) = {N -d-h+l)'^ YH=d\'"it{^) ' ' ' '^Iti.h)]' , which, 
if the model fit is good should be close to the vector [1 • • • 1]'. A second measure of goodness 
of fit is the mean absolute forecast error, known also as the mean absolute deviation, defined 
3S MAE{h) = {N -d-h + l)-^Y.t=d\Mh)\ ■■■ \ept{h)\]',wheTeet{h) = [eu{h) ••• Cptih)]' 
and |.| indicates absolute value. Finally a third measure of goodness of fit is the mean forecast 
error, defined by ME[h) = [N — d — h + which can be used to measure 

how biased are the forecasts, the log-likelihood function (evaluated at the posterior mean of 
the volatility) and the Bayes factors (see also the previous section) can be used for model 
performance and model comparison. 

For the volatility, from the evolution Q we have -E(S^-^|y*) = £'(S^^|y*) and so we 
can find some symmetric random matrix r.t+i with zero mean so that '^i^i = + Ht+i, 
defining a random walk evolution for . This implies that = S^"*^ -|- "^1^=1 ^t+i and 

one way to support this evolution of is by defining = kl/{{T,^^y Bt+h^{T.^^), where 
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Figure 1: Log-returns for IBM shares and S&P 500 index. 



Bt^h follows, independently of T,^ , a singular multivariate beta distribution with degrees 
of freedom n/2 and 1/2 respectively. Then, from the Wishart distribution of S^^, we have 
'^t+h ^ ^piP^'T' + P ~ 1) ^'S^^) and so Sf+^Jy* IWp{Pn + 2p, k^^St). Then we can write 



4 Forecasting IBM shares and S&P 500 index data 

We consider bivariate time series data consisting of 888 log-returns of IBM shares and of the 
S&P 500 index. The data, which are plotted in Figure [TJ are collected in a daily frequency 
and they are discussed in Tsay (2002, Chapter 9). Here we propose the use of TV-VAR 
models with stochastic volatility in order to forecast time series and the volatility. Table [T] 
shows the log-likelihood function (evaluated at the posterior mean of Sj), the mean of squared 
standardized one-step forecast errors {MSSE{1)) and the mean absolute one-step forecast 
error {MAE{1)), for a variety of TV-VAR models. Here, for the evolution of ^u, we have 
used a single discount factor 5 so that A = SI^p+i = 5l2d+i- As 5 gets small (here S = 0.8), 
the performance of the models deteriorates. The largest log-likelihood function (for the given 
data) is obtained for the model with d = 2, 6 = 0.98, /? = 0.9, and it is indicated with boldface 
in the table. This model produces also decent values in the MSSE{1), being close to 1. It 
is worth mentioning that large values of the order of the TV-VAR d results in models that 
can approximate time-varying vector moving average models, e.g. here d = 50. However, 
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Table 1: Log-likelihood function (evaluated at the posterior mean of the volatility), mean 
squared standard one-step forecast error (^^^^^^(l)) and mean absolute one-step forecast 
error {MAE{1)), for several values of the parameters d, 5 and /3 of the TV-VAR model. 









MSSE(l) 


MAE(l) 


LL 


— 1 

— ± 


V — U.c/O 


R — 
P — 


n QQ 


— 99781 q 1 


f9 408 9 0461' 

^.'-t\J<J .\J'-t\J \ 


\7 375 6 3651' 








P - 


- n Q 




\o q7q n q4ql' 


\7 375 6 3651' 

1 .O 1 KJ \J -kjUkj \ 






— 8 


B — 
P — 




i( (J .i; W O O O W tJ t: 


fo 4q8 n 9q4l' 


fi 1 7q8 q 1 351' 

|X±. 1 iJO cf . ±kjO\ 








R - 
P - 


-HQ 


— 1 352 X 10"^° 

A. • KJ KJ ^ -L- \J 


\Q 044 0261' 


111 798 q 135 ' 

-L-L. 1 tJKJ tJ • J^KJKJ \ 


Li 


— 9 


— u.yo 


R — 
P — 


n QQ 


—10^491 


Ft qyn 1 i^q4l' 


fq Ofil 7 0091' 








R - 
P - 


-HQ 


-12614.11 


fn qqi n Q7f»l' 


fQ fifii 7 nn2l' 






— 8 


R — 
P — 


D QQ 


—991 979q9ns 


fo 478 n 1 311' 


fq S44 7 9731' 

\tJ.KJ^^ 1 J KJ\ 








R - 
P - 


- n q 

— u.y 


—2 103 X 10"^° 


rn 039 0131' 

\\J lyjKJ tJ KJ • \J -LKJ \ 


fq 344 7 273 ' 




— Q 

— O 


c — u.yo 


R — 
P — 


n qq 


— 1 57fi43 3 


\^ 54q 1 3881' 

. O'-tcf ±.000 \ 


\R 980 6 36ql' 








R - 
P - 


- n q 


—32741 4q 


\n 85q 8661' 


fs 280 6 36q ' 

KJ.^OVJ \J tKjKJiJ \ 






5 = 0.8 


3 = 


0.99 


—41 8'^4qfi1 1 ^ 


fo oq4 n 0491' 


\R 350 6 4961' 








fi-- 


= 0.9 


-3.042 X 10"^° 


[0.010 0.006]' 


[8.350 6.426]' 


d 


= 5 


<5 = 0.98 


P = 


0.99 


-282064.2 


[1.445 1.292]' 


[8.117 6.236]' 










= 0.9 


-109868.8 


[0.841 0.843]' 


[8.117 6.236]' 






5 = 0.8 


P = 


0.99 


-4700987042 


[0.057 0.023]' 


[8.146 6.258]' 








(3-- 


= 0.9 


-2.032 X 10"^° 


[0.007 0.003]' 


[8.146 6.258]' 


d-- 


= 10 


<5 = 0.98 


P = 


0.99 


-603785.9 


[1.328 1.199]' 


[7.704 5.886]' 








13-- 


= 0.9 


-306464.6 


[0.824 0.826]' 


[7.704 5.886]' 






5 = 0.8 


P = 


0.99 


-4162987498 


[0.049 0.009]' 


[7.715 5.893]' 










= 0.9 


-1.730 X 10"^° 


[0.005 0.001]' 


[7.715 5.893]' 


d- 


= 50 


6 = 0.98 


P = 


0.99 


-2786597 


[1.131 0.993]' 


[7.009 5.733]' 










= 0.9 


-1796252 


[0.814 0.796]' 


[7.009 5.733]' 






5 = 0.8 


P = 


0.99 


-3327435279 


[0.005 0.007]' 


[7.010 5.734]' 










= 0.9 


-5.694 X 10^8 


[0.007 0.001]' 


[7.010 5.734]' 



according to Tabled] all models with d = 50 are inferior to that model with d = 2, 5 = 0.98 
and f3 = 0.9. For 5 = 1 we obtain a time-invariant VAR model with stochastic volatility, 
but this is inferior to the TV-VAR models, even compared with the less favourable TV-VAR 
models using 6 = 0.8 (results not shown here). It turns out that a slow evolution of the AR 
matrices (corresponding to 5 = 0.98) produces the best results. 

In order to further compare the chosen model with other models, we use the Bayes factors 
as discussed in the previous section. We compare model 2 (d = 2) with models 1,3,4,5,6 
having respectively d = 1, 3, 5, 10, 50, where for all 6 models 6 = 0.98 and /3 = 0.9. Figure [2] 
shows 5 plots of the Bayes factors of model 2 against the other 5 models. Prom these plots 
the superiority of model 2 is clear; for most of the time points model 2 produces Bayes factors 
larger than 1. For each of the comparisons the mean of the Bayeas factor is larger than 1, i.e. 
3.487 (model 2 vs model 1), 1.688 (model 2 vs model 3), 2.083 (model 2 vs model 4), 14.965 
(model 2 vs model 5), and 9.021 (model 2 vs model 6). 

Figure [3] shows the one-step forecast of the volatility and the one-step forecast of the 
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Model 2 (d=2) vs model 1 (d=1) Model 2 (d=2) vs model 3 (d=3) Model 2 (d=2) vs model 4 (d=5) 
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Model 2 (d=2) vs model 5 (d=10) 
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Figure 2: Bayes factor for model 2 (d = 2) against models 1,3,4,5,6 {d = 1, 3, 5, 10, 50 respec- 
tively), all 6 models having J = 0.98,/3 = 0.9. 



correlation between the IBM shares and the S&P 500 index. The forecast of the correlation 
indicates that IBM shares are highly correlated with values of the S&P 500 index and that 
this correlation is dynamic. 

In the previous section it is shown that the sequence of the matrices {Pt} is bounded and 
this was a key argument on the stability of the forecast covariance matrix. Figure [4] plots 
the values of {Pt}, from which we can see that the diagonal elements of {Pt} are bounded 
above by 1, while the off-diagonal elements of {Pt} are bounded below by -0.030 and above 
by 0.007. However, we should note that, although {Pt} is bounded, it does not converge to a 
stable matrix P, something that is usual in dynamic linear models with time-invariant design 
components. 

One of the advantages of the proposed model is its capability to forecast future values of 
the time series vector, but also to forecast the volatility. Table [2] shows three performance 
measures of multi-step forecasting, ranging from h = 1 (one day forecast) to /i = 20 (twenty 
day forecast). As expected the forecast performance deteriorates as h increases, but the 
model is still usable for a 5-day forecast (corresponding to a weekly forecast, considering only 
trading days). The 20-day forecast (corresponding to a monthly forecast) produces very large 
MAE{1) and ME{1). It is interesting to note that throughout the range of h, the MSSE{\) 
retains reasonable values (being close to 1), where for h > 3 the forecast covariance matrix is 
slightly underestimated. 
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Volatility 



Correlation 





Figure 3: 1-step forecasts of the volatility (solid line for the IBM returns and dotted line for 
the S&P 500 index) and 1-step forecast for the correlation of the IBM returns and the S&P 
500 index. 



5 Sequential portfolio allocation: foreign exchange data 

We consider optimal portfolio allocation for FX data. The exchange rates are the Australian 
dollar (AUD), British pounds (GBP), Canadian dohar (CAD), German Deutschmark (GDM), 
Dutch guilder (DUG), French frank (FRF), Japanese yen (JPY) and Swiss franc (SWF), all 
expressed as number of units of the foreign currency per US dollar. The data, part of which 
are discussed in Franses and Dijk (2000), consist of 774 daily observations from 2 January 
1995 to 31 December 1997. Let xu be the exchange rate of currency i {i = 1,... ,8) and 
yt = [yit • • • yst]' be the 8-dimensional time series vector {t = 1, . . . , 773) consisting of the 
geometric returns yu = xu/xi^t-i — 1 (shown in Figure [5|) of the above exchange rates; yu 
being the return of AUD, . . ., ygt being the return of SWF. Here, as in Aguilar and West 
(2000), we have chosen to use geometric returns for the analysis, but log-returns can also be 
used (Soyer and Tanyeri, 2006). 

Sequential portfolio allocation aims to find at each time t an optimal allocation vector 
at, including the proportions of each currency to be invested, so that the expected return 
(^ift = m is constant, where ft = yt-i{l) is the one-step forecast mean of yt- Assuming no 
transaction costs and free reallocation of US dollars over all currencies, the realized return 
rt = a[yt can be used to judge the performance of the weights at. Several portfolio allocation 
rules can be considered for the evaluation of at, but here we adopt a sequential version of 
the Markowitz mean-variance optimization allocation, according to which at is chosen to 
minimize the variance of rt\y^~^, that is minimize Var(r( |y*~^) = a'tQtat, where Qt = Qt~i{^) 
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Stataility of matrix P 




Figure 4: Stability of matrix {Pt} = {pij,t}i,j=i,2'i shown are (left panel), {p22,t} (mid 

panel) and {pi2,t} (right panel). 



Table 2: Three performance measures {MSSE{h), MAE{h) and ME{h)) for multi-step 
forecasting (for several values of the forecast horizon h) using the TV-VAR model {d = 2,6 = 
0.98, /3 = 0.9), for the IBM data. 



MSSE{h) 



MAE{h) 



ME{h) 



h = 1 [0.991 0.975]' [9.061 7.002]' 

h = 2 [0.949 0.962]' [11.500 9.763]' 

/i = 3 [1.128 1.084]' [20.940 20.614]' 

h = A [1.468 1.221]' [59.796 73.719]' 

/i = 5 [1.360 1.481]' [259.919 369.380]' 

/i = 10 [2.034 1.826]' [1755316 2778512]' 

h=lh [1.828 2.265]' [14641429303 23426293367]' 

h = 20 [2.785 2.919]' [1.243997 x 10^^ 1.996699 x lO^'']' 



[0.224 0.378]' 
[0.205 1.458]' 
[2.681 7.323]' 
[24.114 43.938]' 
[170.750 294.774]' 
[1628008 2657889]' 
[14316186757 23117364799]' 
[1.235505 X 10^^ 1.988631 x lO^'']' 
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Figure 5: FX data; shown are the geometric returns yu = xa/xi^t-i — 1 of each of the currency 
i = l,...,8. 

is the one-step forecast covariance matrix of yt\y^~^- It is well known that the above allocation 
maximizes the expected return a[ft (if this is not set to be equal to m) given that the variance 
of rt is constant, i.e. a[Qtat = o"^, for some o"^ > 0. The unconstrained portfolio (UP) strategy 
computes the optimal weights as 

^ mQt^ft 
flQT'ft' 

The constrained portfolio (CP) allocation rule further constrains at to satisfy a'^lp = 
1, where Ip = [1 • • • 1]', and then effectively solves a constrained quadratic optimization 
problem. The solution is 

^ 1 / KQT'i'i-pm - ft) ft - flQt'apm - ft)lp \ 

* V a'pQ7%)iflQT'ft)-{KQT'ff)' J' 

Finally the naive equal weight portfolio (EWP) sets at = [1/p ■ ■ ■ l/p] so that each tut 
is allocated the same weight l/p, where yt = [yit ■•• Vpt]' and p > 2. Similar portfolio 
allocation strategies are discussed in Aguilar and West (2000), in Soyer and Tanyeri (2006) 
and in references therein. 

Table [3] shows the mean of the cumulative realized returns (%) for several time series 
models. We use TV-VAR models with m = 0.001, order d = 1,5,10,15, 6 = 0.96,1 and 
P = 0.97 (here as in the IBM example we use a single discount factor 6, i.e. A = SIdp+i)- 
m = 0.001 has been chosen from historical returns; a similar approach is adopted in Aguilar 
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Table 3: Mean of cumulative portfolio returns (%) under three allocation rules: constraint 
portfolio (CP), unconstraint portfolio (UP) and equal weight portfolio (EWP). A TV-VAR 
model is used with order d, discount factor 5, [3 = 0.97 and m = 0.001. 





CP (<5 = 0.96) 


UP {5 = 0.96) 


CP {5 = 1) 


UP {5 = 1) 


EWP 


d= 1 


7.459 


0.294 


-3.814 


-0.014 


-8.679 


(i = 5 


13.759 


0.073 


-3.910 


0.100 


-8.464 


d = 10 


36.918 


0.460 


-7.248 


0.118 


-8.602 


d = lh 


-20.607 


0.111 


-7.063 


0.083 


-8.848 



and West (2000). First we observe that, as expected, the EWP realizes loss throughout the 
models considered. Under the CP, the TV-VAR outperforms the VAR model, expect for 
d = 15. Under the UP, TV-VAR does again better (except for d = 5). We also note that 
under CP, all VAR models {6 = 1) return loss. It is clear that the TV-VAR model with d = 10 
and 5 = 0.96 produces the best results with both CP and UP being superior compared to the 
other models. Models with larger orders tend to be increasingly slow having a large amount 
of parameters to be estimated. 

For the chosen model with d = 10, Figure [6] shows the percentage realized cumulative 
returns. From this graph we observe a high realized portfolio profit using the CP; the UP 
achieves marginal profit, and the EWP realizes significant loss. The corresponding optimal 
portfolio weights at (for the CP) are shown in Figure [71 

6 Concluding comments 

In this paper we have proposed the time-varying vector autoregressive model (TV-VAR) 
for modelling multivariate time series. Adopting Bayesian inference, we propose the use of 
conjugate priors, resulting in fast and accurate computations. Doubt in the choice of the 
hyperparameters of the priors, can be eliminated by using relatively long time series data, 
which are usually available in financial time series. The model includes an important element 
of stochastic volatility, which makes the model realistic and very useful, as most of financial 
time series exhibit heteroscedasticity. The volatility is estimated using inverted Wishart 
and singular multivariate beta distributions, which form a fast and flexible multiplicative 
evolutionary law for the volatility. The proposed approach offers the advantage of combining 
in one model forecasting of the time series vector as well as forecasting of the volatility 
covariance matrix. Using bivariate data of IBM shares and FX data of 8 currencies we provide 
empirical evidence in favour of the TV-VAR. Our empirical findings suggest that TV-VAR 
models work better than the usual VAR models. We believe that time-varying models offer 
important advantages for financial time series forecasting and future research is likely to be 
devoted in this direction. 
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Figure 6: Portfolio allocation for the 8-variate time series of FX data; shown are the % 
cumulative returns using the constrained portfolio (solid line), the unconstrained portfolio 
(dashed line) and the equal weight portfolio (dotted line). 
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